On the equivalence between Kolmogorov-Smirnov and ROC curve metrics for binary classification

Authors

  • Paulo J. L. Adeodato
  • Sílvio B. Melo
Abstract

Binary decisions are very common in artificial intelligence. Applying a threshold to a continuous score gives the human decision maker the power to choose the operating point that separates the two classes. In most application fields, the classifier’s discriminating power is measured along the continuous range of the score by the Area Under the ROC curve (AUC_ROC). Only finance uses the weaker single-point metric, the maximum Kolmogorov-Smirnov (KS) distance. This paper proposes the Area Under the KS curve (AUC_KS) for performance assessment and proves that AUC_ROC = 0.5 + AUC_KS, which offers a simpler way to calculate the AUC_ROC. This matters even more for ROC averaging in ensembles of classifiers or in n-fold cross-validation. The proof is geometrically inspired by rotating the whole KS curve so that it lies on top of the ROC chance diagonal. On the practical side, having the independent variable on the abscissa of the KS curve simplifies the calculation of the AUC_ROC. On the theoretical side, this research gives insights into probabilistic interpretations of classifier assessment and integrates the existing body of knowledge of the information-theoretical ROC approach with the proposed statistical approach based on the thoroughly known KS distribution.
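
The identity AUC_ROC = 0.5 + AUC_KS stated in the abstract can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the KS curve plots the gap TPR − FPR against the midpoint (TPR + FPR) / 2, which is the coordinate along the chance diagonal after the rotation the abstract mentions. The paper's exact choice of abscissa may differ. The synthetic data and the helper names `roc_points` and `trapezoid_area` are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) after each sample, sweeping the threshold from high to low."""
    order = np.argsort(-scores, kind="stable")
    y = labels[order]
    tpr = np.concatenate(([0.0], np.cumsum(y) / y.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - y) / (1 - y).sum()))
    return fpr, tpr

def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2.0))

# Synthetic, imbalanced binary problem (assumption: Gaussian scores, unit shift).
rng = np.random.default_rng(0)
labels = (rng.random(2000) < 0.3).astype(int)
scores = rng.normal(loc=labels * 1.0, scale=1.0)

fpr, tpr = roc_points(scores, labels)
auc_roc = trapezoid_area(fpr, tpr)

# KS curve under the stated assumption: gap between the class curves,
# plotted against their midpoint along the chance diagonal.
ks_gap = tpr - fpr
ks_axis = (tpr + fpr) / 2.0
auc_ks = trapezoid_area(ks_axis, ks_gap)

# The single-point metric the abstract contrasts with the area-based one.
max_ks = float(ks_gap.max())

print(f"AUC_ROC      = {auc_roc:.6f}")
print(f"0.5 + AUC_KS = {0.5 + auc_ks:.6f}")
print(f"max KS       = {max_ks:.6f}")
```

With this construction the two printed areas agree up to floating-point rounding, because the trapezoid sums telescope in the same way as the continuous integrals. The maximum KS distance, by contrast, summarizes the same curve by its peak alone.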

Similar resources

Equivalência entre a Área sob a Curva Kolmogorov-Smirnov e o Índice de Gini na Avaliação de Desempenho de Decisões Binárias

This paper proposes and proves the important equivalence between the Gini index and the area under the Kolmogorov-Smirnov (KS) distribution curve. The proof’s rationale is similar to that used in the proof of equivalence between AUC_ROC and AUC_KS. But different from that, this one uses a transformation that preserves the 1-to-1 correspondence between the ideal classifier on the KS and Lorenz c...

ROC curve equivalence using the Kolmogorov-Smirnov test

This paper describes a simple, non-parametric and generic test of the equivalence of Receiver Operating Characteristic (ROC) curves based on a modified Kolmogorov-Smirnov (KS) test. The test is described in relation to the commonly used techniques such as the Area Under the ROC curve (AUC) and the Neyman-Pearson method. We first review how the KS test is used to test the null hypotheses that th...

Optimal Categorical Attribute Transformation for Granularity Change in Relational Databases for Binary Decision Problems in Educational Data Mining

This paper presents an approach for transforming data granularity in hierarchical databases for binary decision problems by applying regression to categorical attributes at the lower grain levels. Attributes from a lower hierarchy entity in the relational database have their information content optimized through regression on the categories' histogram trained on a small exclusive labelled sampl...

Evaluation of Analytical Methods for Connectivity Map Data

Connectivity map data and associated methodologies have become a valuable tool in understanding drug mechanism of action (MOA) and discovering new indications for drugs. However, few systematic evaluations have been done to assess the accuracy of these methodologies. One of the difficulties has been the lack of benchmarking data sets. Iskar et al. (PLoS. Comput. Biol. 6, 2010) predicted the Ana...

Optimal thresholds criteria for ROC surfaces

Consider the ROC surface which is a generalization of the ROC curve for three-class diagnostic problems. In this work, we propose five criteria for the three-class ROC surface by extending the Youden index, the sum of sensitivity and specificity, the maximum vertical distance, the amended closest-to-(0,1) and the true rate. It may be concluded that these five criteria can be expressed as a func...

Journal:
  • CoRR

Volume abs/1606.00496

Publication date: 2016